eggrobin
Member

As for ^, - is treated normally at the beginning of a UnicodeSet and as a set elsewhere.

Again, this is madness and we should probably get rid of this possibility, but for now let us make sure we know exactly which way madness lies.

Checklist

  • Required: Issue filed: ICU-23179
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

@eggrobin
Member Author

eggrobin commented Sep 9, 2025

@markusicu, friendly ping (#3604, the UnicodeSet parser rewrite, is waiting behind this one).

@eggrobin eggrobin merged commit dce38b5 into unicode-org:main Sep 11, 2025
94 checks passed
@macchiati
Member

As for ^, - is treated normally at the beginning of a UnicodeSet and as a set elsewhere.

I didn't understand "as a set elsewhere". Did you mean is treated as syntax to form a range elsewhere? Eg [a-b]?
BTW I think we only had the "is normal at the start" to match regexes.

@eggrobin
Member Author

I didn't understand "as a set elsewhere". Did you mean is treated as syntax to form a range elsewhere? Eg [a-b]?

The context here is that the symbol table can map characters to sets, e.g., it can say “0 means [a-z]”. And this can be done to syntax characters, e.g., “- means [1-9]”. The effect of mapping syntax characters varies, and this is what this pull request tests. Obviously this is madness and we should disallow the mapping of syntax characters in a future version of ICU, but for now I am testing exactly what madness happens so I can preserve it in my rewrite of the parser.
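To make the "0 means [a-z]" case concrete, here is a minimal toy sketch in Python. It is not ICU code and does not use the ICU API: `toy_parse` and the dictionary-based `symbols` table are hypothetical names invented for illustration. It only models the harmless case, where a non-syntax character is replaced by the set it maps to. It deliberately has no range, negation, or escape syntax, so it says nothing about what happens when a syntax character such as - or ^ is mapped, which is the position-dependent behaviour this pull request tests.

```python
import string


def toy_parse(pattern, symbols):
    """Expand a bracketed list of single characters into a Python set.

    Toy model only: no ranges, negation, escapes, or nesting. Any character
    found in `symbols` contributes its mapped set instead of itself.
    """
    if len(pattern) < 2 or pattern[0] != "[" or pattern[-1] != "]":
        raise ValueError("expected a pattern of the form [...]")
    result = set()
    for ch in pattern[1:-1]:
        if ch in symbols:
            # The symbol table wins: the character stands for a whole set.
            result |= symbols[ch]
        else:
            # Unmapped characters are taken literally.
            result.add(ch)
    return result


# Hypothetical symbol table: "0 means [a-z]".
symbols = {"0": set(string.ascii_lowercase)}
expanded = toy_parse("[0_]", symbols)
```

In this toy, `expanded` holds the 26 lowercase ASCII letters plus the underscore, and the character 0 itself is not a member, because it was consumed as a reference to the mapped set.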

@macchiati
Member

macchiati commented Sep 11, 2025 via email
